load("~/Documents/Nonparametric Statisics/Project/clean data/functional/df_hour.RData")
fd_day <- fData(1:24,as_tibble(df_hour[,2:25]))
hours <- 1:24
n <- fd_day$N
x <- t(matrix(rep(hours,n),24,n))
y <- as.matrix(df_hour[,2:25])
k <- 2
system.time(
fdakma0der <- fdakmeans(x = x,y = y, n_clusters = k,
seeds = sample(1:n,k),
warping_class = "none",
metric = "pearson",
centroid_type = "mean",
distance_relative_tolerance = 1e-3,
add_silhouettes = F,
parallel_method = 0L,
number_of_threads = 20L
)
)
Information about the data set:
- Number of observations: 6574
- Number of dimensions: 1
- Number of points: 24
Information about cluster initialization:
- Number of clusters: 2
- Initial seeds for cluster centers: 3872 1366
Information about the methods used within the algorithm:
- Warping method: none
- Center method: mean
- Dissimilarity method: pearson
- Optimization method: bobyqa
Information about warping parameter bounds:
- Warping options: 0.1500 0.1500
Information about convergence criteria:
- Maximum number of iterations: 100
- Distance relative tolerance: 0.001
Information about parallelization setup:
- Number of threads: 20
- Parallel method: 0
Other information:
- Use fence to robustify: 0
- Check total dissimilarity: 1
- Compute overall center: 0
Running k-centroid algorithm:
- Iteration #1
* Size of cluster #0: 1071
* Size of cluster #1: 5503
- Iteration #2
* Size of cluster #0: 2360
* Size of cluster #1: 4214
- Iteration #3
* Size of cluster #0: 2482
* Size of cluster #1: 4092
- Iteration #4
* Size of cluster #0: 2518
* Size of cluster #1: 4056
- Iteration #5
* Size of cluster #0: 2536
* Size of cluster #1: 4038
Active stopping criteria:
- The total dissimilarity did not decrease.
user system elapsed
18.556 0.252 18.813
autoplot(fdakma0der)
matplot(t(fdakma0der$center_curves[,1,]),type = 'l',
main='clustered and alligned curves',xlab='days',ylab='crashes', col = 1:k,lwd = 3)
this is a clear distinction between working days and holydays.
I start with a permutation test on the global curves:
i1 <- which(fdakma0der$memberships==1)
n1 <- length(i1)
i2 <- which(fdakma0der$memberships==2)
n2 <- length(i2)
seed=7122023
B=10000
start by comparing group 1 and 2:
aug_df=y
n= n1 + n2
meandiff=(colMeans(y[i1,])-colMeans(y[i2,]))
plot(meandiff,type = 'l')
T0=sum(meandiff^2)
T0
[1] 2102.307
T0_perm=numeric(B)
for(perm in 1:B){
permutation <- sample(n)
df_perm=aug_df[permutation,]
perm_1 = df_perm[1:n1,]
perm_2 = df_perm[(n1+1):n,]
T0_perm[perm]=sum(((colMeans(perm_1)-colMeans(perm_2)))^2)
}
sum(T0_perm >= T0)/B
[1] 0
hist(T0_perm,xlim = c(0,1.2*T0))
abline(v=T0,col='green')
this are statistically different.
we now move to a local test.
we use the interval wise procedure, so we control the interval-wise error rate
between clusters 1 an 2
tst=IWT2(y[i1,],y[i2,],B=1000)
[1] "Point-wise tests"
[1] "Interval-wise tests"
[1] "creating the p-value matrix: end of row 2 out of 24"
[1] "creating the p-value matrix: end of row 3 out of 24"
[1] "creating the p-value matrix: end of row 4 out of 24"
[1] "creating the p-value matrix: end of row 5 out of 24"
[1] "creating the p-value matrix: end of row 6 out of 24"
[1] "creating the p-value matrix: end of row 7 out of 24"
[1] "creating the p-value matrix: end of row 8 out of 24"
[1] "creating the p-value matrix: end of row 9 out of 24"
[1] "creating the p-value matrix: end of row 10 out of 24"
[1] "creating the p-value matrix: end of row 11 out of 24"
[1] "creating the p-value matrix: end of row 12 out of 24"
[1] "creating the p-value matrix: end of row 13 out of 24"
[1] "creating the p-value matrix: end of row 14 out of 24"
[1] "creating the p-value matrix: end of row 15 out of 24"
[1] "creating the p-value matrix: end of row 16 out of 24"
[1] "creating the p-value matrix: end of row 17 out of 24"
[1] "creating the p-value matrix: end of row 18 out of 24"
[1] "creating the p-value matrix: end of row 19 out of 24"
[1] "creating the p-value matrix: end of row 20 out of 24"
[1] "creating the p-value matrix: end of row 21 out of 24"
[1] "creating the p-value matrix: end of row 22 out of 24"
[1] "creating the p-value matrix: end of row 23 out of 24"
[1] "creating the p-value matrix: end of row 24 out of 24"
[1] "Interval-Wise Testing completed"
plot(tst)
as expected the difference is both in the evening and in the morning.
if we augment the number fo clusters:
k <- 4
system.time(
fdakma0der <- fdakmeans(x = x,y = y, n_clusters = k,
seeds = sample(1:n,k),
warping_class = "none",
metric = "pearson",
centroid_type = "mean",
distance_relative_tolerance = 1e-3,
add_silhouettes = F,
parallel_method = 0L,
number_of_threads = 20L
)
)
Information about the data set:
- Number of observations: 6574
- Number of dimensions: 1
- Number of points: 24
Information about cluster initialization:
- Number of clusters: 4
- Initial seeds for cluster centers: 346 3809 3047 1834
Information about the methods used within the algorithm:
- Warping method: none
- Center method: mean
- Dissimilarity method: pearson
- Optimization method: bobyqa
Information about warping parameter bounds:
- Warping options: 0.1500 0.1500
Information about convergence criteria:
- Maximum number of iterations: 100
- Distance relative tolerance: 0.001
Information about parallelization setup:
- Number of threads: 20
- Parallel method: 0
Other information:
- Use fence to robustify: 0
- Check total dissimilarity: 1
- Compute overall center: 0
Running k-centroid algorithm:
- Iteration #1
* Size of cluster #0: 4158
* Size of cluster #1: 790
* Size of cluster #2: 1533
* Size of cluster #3: 93
- Iteration #2
* Size of cluster #0: 3830
* Size of cluster #1: 1028
* Size of cluster #2: 1462
* Size of cluster #3: 254
- Iteration #3
* Size of cluster #0: 3258
* Size of cluster #1: 1217
* Size of cluster #2: 1455
* Size of cluster #3: 644
- Iteration #4
* Size of cluster #0: 2730
* Size of cluster #1: 1260
* Size of cluster #2: 1601
* Size of cluster #3: 983
- Iteration #5
* Size of cluster #0: 2436
* Size of cluster #1: 1185
* Size of cluster #2: 1800
* Size of cluster #3: 1153
- Iteration #6
* Size of cluster #0: 2212
* Size of cluster #1: 1187
* Size of cluster #2: 1929
* Size of cluster #3: 1246
- Iteration #7
* Size of cluster #0: 2081
* Size of cluster #1: 1234
* Size of cluster #2: 1985
* Size of cluster #3: 1274
- Iteration #8
* Size of cluster #0: 2001
* Size of cluster #1: 1276
* Size of cluster #2: 2022
* Size of cluster #3: 1275
- Iteration #9
* Size of cluster #0: 1948
* Size of cluster #1: 1310
* Size of cluster #2: 2041
* Size of cluster #3: 1275
- Iteration #10
* Size of cluster #0: 1911
* Size of cluster #1: 1345
* Size of cluster #2: 2047
* Size of cluster #3: 1271
- Iteration #11
* Size of cluster #0: 1892
* Size of cluster #1: 1362
* Size of cluster #2: 2054
* Size of cluster #3: 1266
- Iteration #12
* Size of cluster #0: 1869
* Size of cluster #1: 1381
* Size of cluster #2: 2059
* Size of cluster #3: 1265
- Iteration #13
* Size of cluster #0: 1852
* Size of cluster #1: 1408
* Size of cluster #2: 2061
* Size of cluster #3: 1253
- Iteration #14
* Size of cluster #0: 1838
* Size of cluster #1: 1428
* Size of cluster #2: 2062
* Size of cluster #3: 1246
- Iteration #15
* Size of cluster #0: 1815
* Size of cluster #1: 1448
* Size of cluster #2: 2064
* Size of cluster #3: 1247
- Iteration #16
* Size of cluster #0: 1808
* Size of cluster #1: 1454
* Size of cluster #2: 2066
* Size of cluster #3: 1246
Active stopping criteria:
- The total dissimilarity did not decrease.
user system elapsed
75.334 0.565 75.885
autoplot(fdakma0der)
matplot(t(fdakma0der$center_curves[,1,]),type = 'l',
main='clustered and alligned curves',xlab='days',ylab='crashes', col = 1:k,lwd = 3)
this is a clear distinction between working days and holydays.
I start with a permutation test on the global curves:
i1 <- which(fdakma0der$memberships==1)
n1 <- length(i1)
i2 <- which(fdakma0der$memberships==2)
n2 <- length(i2)
i3 <- which(fdakma0der$memberships==3)
n3 <- length(i3)
i4 <- which(fdakma0der$memberships==4)
n4 <- length(i4)
doing a functional anova
groups <- fdakma0der$memberships
faov <- ITPaovbspline(y ~ groups)
[1] "First step: basis expansion"
Swapping 'y' and 'argvals', because 'y' is simpler,
and 'argvals' should be; now dim(argvals) = 24 ; dim(y) = 24 x 6574
[1] "Second step: joint univariate tests"
[1] "Third step: interval-wise combination and correction"
[1] "creating the p-value matrix: end of row 2 out of 24"
[1] "creating the p-value matrix: end of row 3 out of 24"
[1] "creating the p-value matrix: end of row 4 out of 24"
[1] "creating the p-value matrix: end of row 5 out of 24"
[1] "creating the p-value matrix: end of row 6 out of 24"
[1] "creating the p-value matrix: end of row 7 out of 24"
[1] "creating the p-value matrix: end of row 8 out of 24"
[1] "creating the p-value matrix: end of row 9 out of 24"
[1] "creating the p-value matrix: end of row 10 out of 24"
[1] "creating the p-value matrix: end of row 11 out of 24"
[1] "creating the p-value matrix: end of row 12 out of 24"
[1] "creating the p-value matrix: end of row 13 out of 24"
[1] "creating the p-value matrix: end of row 14 out of 24"
[1] "creating the p-value matrix: end of row 15 out of 24"
[1] "creating the p-value matrix: end of row 16 out of 24"
[1] "creating the p-value matrix: end of row 17 out of 24"
[1] "creating the p-value matrix: end of row 18 out of 24"
[1] "creating the p-value matrix: end of row 19 out of 24"
[1] "creating the p-value matrix: end of row 20 out of 24"
[1] "creating the p-value matrix: end of row 21 out of 24"
[1] "creating the p-value matrix: end of row 22 out of 24"
[1] "creating the p-value matrix: end of row 23 out of 24"
[1] "creating the p-value matrix: end of row 24 out of 24"
[1] "Interval Testing Procedure completed"
comparisons of each couple
tst <- IWT2(y[i1,],y[i2,])
[1] "Point-wise tests"
[1] "Interval-wise tests"
[1] "creating the p-value matrix: end of row 2 out of 24"
[1] "creating the p-value matrix: end of row 3 out of 24"
[1] "creating the p-value matrix: end of row 4 out of 24"
[1] "creating the p-value matrix: end of row 5 out of 24"
[1] "creating the p-value matrix: end of row 6 out of 24"
[1] "creating the p-value matrix: end of row 7 out of 24"
[1] "creating the p-value matrix: end of row 8 out of 24"
[1] "creating the p-value matrix: end of row 9 out of 24"
[1] "creating the p-value matrix: end of row 10 out of 24"
[1] "creating the p-value matrix: end of row 11 out of 24"
[1] "creating the p-value matrix: end of row 12 out of 24"
[1] "creating the p-value matrix: end of row 13 out of 24"
[1] "creating the p-value matrix: end of row 14 out of 24"
[1] "creating the p-value matrix: end of row 15 out of 24"
[1] "creating the p-value matrix: end of row 16 out of 24"
[1] "creating the p-value matrix: end of row 17 out of 24"
[1] "creating the p-value matrix: end of row 18 out of 24"
[1] "creating the p-value matrix: end of row 19 out of 24"
[1] "creating the p-value matrix: end of row 20 out of 24"
[1] "creating the p-value matrix: end of row 21 out of 24"
[1] "creating the p-value matrix: end of row 22 out of 24"
[1] "creating the p-value matrix: end of row 23 out of 24"
[1] "creating the p-value matrix: end of row 24 out of 24"
[1] "Interval-Wise Testing completed"
plot(tst)
tst <- IWT2(y[i1,],y[i3,])
[1] "Point-wise tests"
[1] "Interval-wise tests"
[1] "creating the p-value matrix: end of row 2 out of 24"
[1] "creating the p-value matrix: end of row 3 out of 24"
[1] "creating the p-value matrix: end of row 4 out of 24"
[1] "creating the p-value matrix: end of row 5 out of 24"
[1] "creating the p-value matrix: end of row 6 out of 24"
[1] "creating the p-value matrix: end of row 7 out of 24"
[1] "creating the p-value matrix: end of row 8 out of 24"
[1] "creating the p-value matrix: end of row 9 out of 24"
[1] "creating the p-value matrix: end of row 10 out of 24"
[1] "creating the p-value matrix: end of row 11 out of 24"
[1] "creating the p-value matrix: end of row 12 out of 24"
[1] "creating the p-value matrix: end of row 13 out of 24"
[1] "creating the p-value matrix: end of row 14 out of 24"
[1] "creating the p-value matrix: end of row 15 out of 24"
[1] "creating the p-value matrix: end of row 16 out of 24"
[1] "creating the p-value matrix: end of row 17 out of 24"
[1] "creating the p-value matrix: end of row 18 out of 24"
[1] "creating the p-value matrix: end of row 19 out of 24"
[1] "creating the p-value matrix: end of row 20 out of 24"
[1] "creating the p-value matrix: end of row 21 out of 24"
[1] "creating the p-value matrix: end of row 22 out of 24"
[1] "creating the p-value matrix: end of row 23 out of 24"
[1] "creating the p-value matrix: end of row 24 out of 24"
[1] "Interval-Wise Testing completed"
plot(tst)
tst <- IWT2(y[i1,],y[i4,])
[1] "Point-wise tests"
[1] "Interval-wise tests"
[1] "creating the p-value matrix: end of row 2 out of 24"
[1] "creating the p-value matrix: end of row 3 out of 24"
[1] "creating the p-value matrix: end of row 4 out of 24"
[1] "creating the p-value matrix: end of row 5 out of 24"
[1] "creating the p-value matrix: end of row 6 out of 24"
[1] "creating the p-value matrix: end of row 7 out of 24"
[1] "creating the p-value matrix: end of row 8 out of 24"
[1] "creating the p-value matrix: end of row 9 out of 24"
[1] "creating the p-value matrix: end of row 10 out of 24"
[1] "creating the p-value matrix: end of row 11 out of 24"
[1] "creating the p-value matrix: end of row 12 out of 24"
[1] "creating the p-value matrix: end of row 13 out of 24"
[1] "creating the p-value matrix: end of row 14 out of 24"
[1] "creating the p-value matrix: end of row 15 out of 24"
[1] "creating the p-value matrix: end of row 16 out of 24"
[1] "creating the p-value matrix: end of row 17 out of 24"
[1] "creating the p-value matrix: end of row 18 out of 24"
[1] "creating the p-value matrix: end of row 19 out of 24"
[1] "creating the p-value matrix: end of row 20 out of 24"
[1] "creating the p-value matrix: end of row 21 out of 24"
[1] "creating the p-value matrix: end of row 22 out of 24"
[1] "creating the p-value matrix: end of row 23 out of 24"
[1] "creating the p-value matrix: end of row 24 out of 24"
[1] "Interval-Wise Testing completed"
plot(tst)
tst <- IWT2(y[i2,],y[i3,])
[1] "Point-wise tests"
[1] "Interval-wise tests"
[1] "creating the p-value matrix: end of row 2 out of 24"
[1] "creating the p-value matrix: end of row 3 out of 24"
[1] "creating the p-value matrix: end of row 4 out of 24"
[1] "creating the p-value matrix: end of row 5 out of 24"
[1] "creating the p-value matrix: end of row 6 out of 24"
[1] "creating the p-value matrix: end of row 7 out of 24"
[1] "creating the p-value matrix: end of row 8 out of 24"
[1] "creating the p-value matrix: end of row 9 out of 24"
[1] "creating the p-value matrix: end of row 10 out of 24"
[1] "creating the p-value matrix: end of row 11 out of 24"
[1] "creating the p-value matrix: end of row 12 out of 24"
[1] "creating the p-value matrix: end of row 13 out of 24"
[1] "creating the p-value matrix: end of row 14 out of 24"
[1] "creating the p-value matrix: end of row 15 out of 24"
[1] "creating the p-value matrix: end of row 16 out of 24"
[1] "creating the p-value matrix: end of row 17 out of 24"
[1] "creating the p-value matrix: end of row 18 out of 24"
[1] "creating the p-value matrix: end of row 19 out of 24"
[1] "creating the p-value matrix: end of row 20 out of 24"
[1] "creating the p-value matrix: end of row 21 out of 24"
[1] "creating the p-value matrix: end of row 22 out of 24"
[1] "creating the p-value matrix: end of row 23 out of 24"
[1] "creating the p-value matrix: end of row 24 out of 24"
[1] "Interval-Wise Testing completed"
plot(tst)
tst <- IWT2(y[i2,],y[i4,])
[1] "Point-wise tests"
[1] "Interval-wise tests"
[1] "creating the p-value matrix: end of row 2 out of 24"
[1] "creating the p-value matrix: end of row 3 out of 24"
[1] "creating the p-value matrix: end of row 4 out of 24"
[1] "creating the p-value matrix: end of row 5 out of 24"
[1] "creating the p-value matrix: end of row 6 out of 24"
[1] "creating the p-value matrix: end of row 7 out of 24"
[1] "creating the p-value matrix: end of row 8 out of 24"
[1] "creating the p-value matrix: end of row 9 out of 24"
[1] "creating the p-value matrix: end of row 10 out of 24"
[1] "creating the p-value matrix: end of row 11 out of 24"
[1] "creating the p-value matrix: end of row 12 out of 24"
[1] "creating the p-value matrix: end of row 13 out of 24"
[1] "creating the p-value matrix: end of row 14 out of 24"
[1] "creating the p-value matrix: end of row 15 out of 24"
[1] "creating the p-value matrix: end of row 16 out of 24"
[1] "creating the p-value matrix: end of row 17 out of 24"
[1] "creating the p-value matrix: end of row 18 out of 24"
[1] "creating the p-value matrix: end of row 19 out of 24"
[1] "creating the p-value matrix: end of row 20 out of 24"
[1] "creating the p-value matrix: end of row 21 out of 24"
[1] "creating the p-value matrix: end of row 22 out of 24"
[1] "creating the p-value matrix: end of row 23 out of 24"
[1] "creating the p-value matrix: end of row 24 out of 24"
[1] "Interval-Wise Testing completed"
plot(tst)
tst <- IWT2(y[i3,],y[i4,])
[1] "Point-wise tests"
[1] "Interval-wise tests"
[1] "creating the p-value matrix: end of row 2 out of 24"
[1] "creating the p-value matrix: end of row 3 out of 24"
[1] "creating the p-value matrix: end of row 4 out of 24"
[1] "creating the p-value matrix: end of row 5 out of 24"
[1] "creating the p-value matrix: end of row 6 out of 24"
[1] "creating the p-value matrix: end of row 7 out of 24"
[1] "creating the p-value matrix: end of row 8 out of 24"
[1] "creating the p-value matrix: end of row 9 out of 24"
[1] "creating the p-value matrix: end of row 10 out of 24"
[1] "creating the p-value matrix: end of row 11 out of 24"
[1] "creating the p-value matrix: end of row 12 out of 24"
[1] "creating the p-value matrix: end of row 13 out of 24"
[1] "creating the p-value matrix: end of row 14 out of 24"
[1] "creating the p-value matrix: end of row 15 out of 24"
[1] "creating the p-value matrix: end of row 16 out of 24"
[1] "creating the p-value matrix: end of row 17 out of 24"
[1] "creating the p-value matrix: end of row 18 out of 24"
[1] "creating the p-value matrix: end of row 19 out of 24"
[1] "creating the p-value matrix: end of row 20 out of 24"
[1] "creating the p-value matrix: end of row 21 out of 24"
[1] "creating the p-value matrix: end of row 22 out of 24"
[1] "creating the p-value matrix: end of row 23 out of 24"
[1] "creating the p-value matrix: end of row 24 out of 24"
[1] "Interval-Wise Testing completed"
plot(tst)
the differences in the magnitudedd peaks are statistically significant.
still don’t know what to do.